Edlib: a C/C++ library for fast, exact sequence alignment using edit distance
Abstract
Summary: We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest without sacrificing functionality, and that it can easily handle very large sequences. Because Edlib is easy to use, flexible, fast and low in memory usage, we expect it to be readily adopted as a building block for future bioinformatics tools.
Availability and implementation: Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib under the MIT licence. Edlib is implemented in C/C++ and supported on Linux, MS Windows and Mac OS.
Supplementary information: Supplementary data are available at Bioinformatics online.
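To make "exact pairwise alignment using edit distance" concrete, here is a minimal C++ sketch of the textbook dynamic-programming definition of edit distance: the minimum number of insertions, deletions and substitutions that turn one sequence into the other. It is not Edlib's API or algorithm. The helper names `editDistance` and `editDistanceWithin` are hypothetical, and Edlib itself relies on a much faster bit-parallel method. The second function illustrates the common idea of bounding the search by a maximum distance `k`, so that only a narrow diagonal band of the table has to be computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not Edlib's API): global edit distance between a and b,
// computed with the classic O(|a|*|b|) DP while keeping only two table rows.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);  // j insertions
    for (size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);  // i deletions
        for (size_t j = 1; j <= b.size(); ++j) {
            int sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);  // match/substitution
            curr[j] = std::min({sub, prev[j] + 1, curr[j - 1] + 1}); // vs deletion/insertion
        }
        std::swap(prev, curr);  // prev now holds row i
    }
    return prev[b.size()];
}

// Hypothetical helper: returns the edit distance if it is <= k, otherwise -1.
// An alignment of cost <= k never leaves the band |i - j| <= k, so cells
// outside the band are treated as "k + 1" (i.e. too expensive).
int editDistanceWithin(const std::string& a, const std::string& b, int k) {
    assert(k >= 0 && "k must be non-negative");
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    if (std::abs(n - m) > k) return -1;  // length difference alone exceeds k
    const int TOO_FAR = k + 1;
    std::vector<int> prev(m + 1, TOO_FAR), curr(m + 1, TOO_FAR);
    for (int j = 0; j <= std::min(m, k); ++j) prev[j] = j;
    for (int i = 1; i <= n; ++i) {
        std::fill(curr.begin(), curr.end(), TOO_FAR);
        if (i <= k) curr[0] = i;
        const int lo = std::max(1, i - k), hi = std::min(m, i + k);
        for (int j = lo; j <= hi; ++j) {
            int sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({sub, prev[j] + 1, curr[j - 1] + 1, TOO_FAR});
        }
        std::swap(prev, curr);
    }
    return prev[m] <= k ? prev[m] : -1;
}
```

For example, `editDistance("kitten", "sitting")` is 3, while `editDistanceWithin("kitten", "sitting", 2)` returns -1 because the true distance exceeds the bound. Keeping only two rows makes memory grow with the length of the second sequence rather than with the product of both lengths, and the band restricts the work per row to at most 2k + 1 cells.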
Similar resources
SlideSort: all pairs similarity search for short reads
MOTIVATION Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge number of short reads. Searching for similar pairs in a string pool is a fundamental process in de novo genome assembly, genome-wide alignment and other important analyses. RESULTS In this study, we designed and implemented an exact algorithm SlideSor...
Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping
MOTIVATION Calculating the edit distance (i.e. the minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend ba...
Fast Similarity Searches and Similarity Joins in Oracle DB
Similarity search and similarity join on strings are important operations for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences [GIJ+01, NMS04]. DNA sequencing in particular produces large collections of erroneous strings that need to be searched, compared, and merged. In our talk, we will use ESTs as our running example. ESTs (Expres...
A Systolic Array for the Sequence Alignment Problem
This report introduces a new systolic algorithm for the sequence alignment problem. This work builds upon an existing systolic array for computing the edit distance between two sequences. The alignment array is meant to be used as the second phase in a two-phase design with a modified edit distance array serving as the first phase. An implementation on the SPLASH programmable logic array is descri...